Abstract

A family of functions $F$ that map $[0, n] \mapsto [0, n]$ is said to be $h$-wise independent if
any $h$ points in $[0, n]$ have an image, for randomly selected $f \in F$, that is uniformly
distributed. This paper gives both probabilistic and explicit randomized constructions
of $n^\epsilon$-wise independent functions, $\epsilon < 1$, that can be evaluated in constant time for
the  standard  random  access  model  of  computation.  Simple  extensions  give  comparable 
behavior  for  larger  domains.  As  a  consequence,  many  probabilistic  algorithms  can  for 
the  first  time  be  shown  to  achieve  their  expected  asymptotic  performance  for  a  feasible 
model  of  computation. 

This  paper  also  establishes  a  tight  tradeoff  in  the  number  of  random  seeds  that  must 
be precomputed for a random function that runs in time $T$ and is $h$-wise independent.


Categories and Subject Descriptors: E.2 [Data Storage Representation]: Hash-table representation;
F.1.2 [Modes of Computation]: Probabilistic Computation; F.2.3 [Tradeoffs among
Computational Measures]; F.2.1 [Computation in finite fields]; G.3 [Probability and Statistics]:
Random number generation.
1. Introduction

Many  probabilistic  algorithms  and  data  structures  have  been  proven  to  work  well  when 
fully  random  functions  are  used  as  unit  time  subroutines.  For  example,  in  the  case  of 
uniform hashing and double hashing, the expected cost for inserting the $(\alpha n + 1)$-st item
into a table of size $n$ has been shown to be $\frac{1}{1-\alpha} + o(1)$ probes [6] and [7]. Moreover, this
cost has been shown to be optimal for that genre of data access [24]. Yet the significance
of  these  performance  bounds  for  real  computation  is  by  no  means  clear.  The  difficulty 
is  that  they  have  been  proven  for  hash  functions  that  are  assumed  to  be  fully  random. 
If, for example, we wish to hash data from $[1, n^2]$ into $[1, n]$, then there are $n^{n^2}$ different
functions that can perform such a mapping, and the program length of such a function
must be about $n^2 \log n$ bits on the average. Such functions are much larger than the
hash  table  they  are  intended  to  service. 

On the other hand, results based upon full randomness sometimes translate into average
case performance guarantees for real computation. In the case of double hashing,
for example, which requires $2 \log n$ random bits per hash key, we may take these bits
to be any fixed portion of the key itself, provided it is at least $2 \log n$ bits long. Then
the probabilistic bound holds as an average case analysis. Just what to do for smaller
number ranges is less clear. For uniform hashing, which requires additional randomness,
the question of how to interpret a probabilistic upper bound on performance is even
more problematic. Yet even where such average case results are meaningful, we would
rather establish randomized performance bounds, which hold, on average, for any set of
data, instead of a bound that cannot be applied to any fixed instance of data.

It  is  also  worth  noting  that  there  are  different  kinds  of  probabilistic  algorithms. 
Some  require  streams  of  disposable  random  bits,  and  a  successful  computation  can  be 
verified  without  their  retention.  In  this  case,  there  may  be  little  need  to  store  the 
random  choices  apart  from  concerns  about  reproducing  the  exact  computation,  and  no 
need  to  have  high  speed  random  access  for  such  data.  Other  algorithms  may  require 
only a moderate number of random bits, which can be readily stored and accessed. For
these  applications,  it  may  be  sufficient  to  postulate  the  existence  of  a  random  bit  source. 
Precomputed  random  strings  might  be  streamed  sequentially  from  secondary  storage, 
and  it  is  even  conceivable  that  a  source  of  quantum  mechanical  uncertainty  could  be 
used  to  generate  the  bits  on  the  fly,  provided  its  capacity  and  degree  of  randomness 
were  adequate  for  the  task  at  hand. 

Other  kinds  of  computations  are  based  upon  random  functions,  where  some  domain 
is  mapped  into  a  range  according  to,  say,  a  uniform  distribution,  and  the  mapping  of 
specific  values  must  be  recalled  at  periodic  instances.  This  form  of  computation  is  less 
forgiving  since  the  random  decisions  must  be  recalled.  If  the  computation  is  dominated 
by  such  calculations,  then  the  speed  of  the  calculations  may  be  important  as  well.  In 
hashing,  for  example,  random  functions  are  used  to  locate  items  in  a  search  table  for 
subsequent  retrieval.  Ideally,  the  mapping  of  an  item  to  a  probe  location  should  be  done 
in  constant  time.  Additional  considerations  include  the  storage  allocated  for  the  hash 
computation  and  the  load  or  fraction  of  storage  that  can  be  occupied  by  data  before 
the  search  performance  becomes  unacceptable.  For  large  scale  parallel  computation, 
a  random  function  might  be  shared  by  a  large  number  of  processors,  and  its  program 
size,  therefore,  might  be  required  to  comprise  a  negligible  percentage  of  local  memory; 
it  might  also  be  required  to  exhibit  high  degrees  of  randomness,  and  to  have  a  long 
expected  lifetime  before  probabilistic  events  occur  that  require  its  replacement  (with 
new  random  seeds). 

Carter  and  Wegman  introduced  universal  hash  functions  [3]  and  thereby  provided  a 
theoretical  framework  to  formalize  methods  that  exploit  actual  hash  functions  exhibiting 
fixed  degrees  of  freedom.  Related  works  [22],  [11]  have  sometimes  required  a  little  more 
limited  randomness,  which  is  usually  formalized  along  the  following  lines: 

Definition  1. 

A family of hash functions $F$ with domain $D$ and range $R$ is $(h, \mu)$-wise independent
if $\forall\, y_1, \ldots, y_h \in R$, $\forall$ distinct $x_1, x_2, \ldots, x_h \in D$:

$$|\{f \in F : f(x_i) = y_i,\ i = 1, 2, \ldots, h\}| \le \mu \frac{|F|}{|R|^h}.$$

Thus the distribution of a random $f \in F$ on any $h$ points is nearly uniform, and
$(h, \mu)$-wise independence implies $(j, \mu)$-wise independence for $j < h$. For expositional
simplicity, we will frequently suppress the $\mu$ parameter and simply refer to $(h)$-wise
independence.

The  limited  randomness  provided  by  such  classes  is  frequently  sufficient  to  achieve 
an  expected  performance  for  many  randomized  algorithms  that  is  equivalent  to  the 
use  of  fully  random  hash  functions.  For  example,  recent  randomized  routing  schemes 
for  size  n  Omega  networks  have  been  proven  to  give  optimal  expected  performance 
(up to constant factors), given a random $\Omega(\log n)$-wise independent hash function ([5],
[13]). The hash functions used to date have typically been polynomials of degree $\beta \log n$
defined  on  finite  fields. 

In particular, Carter and Wegman exhibited the universal classes of $(h)$-wise independent
hash functions that map $[0, m-1] \mapsto [0, n-1]$:

$$F^{(h)} = \Big\{ f \;\Big|\; f(x) = \Big(\Big(\sum_{0 \le j < h} a_j x^j\Big) \bmod p\Big) \bmod n,\ a_j \in [0, p-1] \Big\}, \qquad (1)$$

where $p > m$ is prime. They showed that if, for any set $S \subset [0, p-1]$, a hash function
is randomly selected from $F^{(h)}$ (independent of $S$), to bucket hash $S$ into the $n$ buckets
$[0, n-1]$, then the sum of the expected $j$-th moments of the bucket populations is essentially
the same as that resulting from fully random functions, for $j \le h$. For bucket
hashing with separate chaining, the second moment of the expected chain lengths (i.e.,
bucket populations) determines the expected retrieval time, whence pairwise independence
guarantees optimal expected performance.
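The family in equation (1) can be sketched in a few lines. The following is a minimal illustration, not code from the paper: the names `poly_eval` and `make_poly_hash` and the `seed` parameter are assumptions introduced here, and `p` is assumed to be a prime supplied by the caller.

```python
import random

def poly_eval(coeffs, x, p):
    """Evaluate a_0 + a_1*x + ... + a_{h-1}*x^(h-1) mod p by Horner's rule."""
    acc = 0
    for a in reversed(coeffs):
        acc = (acc * x + a) % p
    return acc

def make_poly_hash(h, p, n, seed=None):
    """Draw one member of the family in equation (1); p is assumed prime."""
    rng = random.Random(seed)
    coeffs = [rng.randrange(p) for _ in range(h)]  # h random seeds a_0..a_{h-1}
    return lambda x: poly_eval(coeffs, x, p) % n
```

Each evaluation costs $h$ multiplications and additions, which is the per-address cost that the rest of the paper seeks to avoid.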

In the case of randomized routing on $n$-node bounded degree graphs, the $O(\log n)$
cost for each memory reference hashed by a function from $F^{(\beta \log n)}$ is readily subsumed
by the $\Omega(\log n)$ delay in routing the data ([5], [13]). Recently, $O(\log n)$-wise independent
hash functions have also been shown to give optimal expected probe performance for
double hashing ([16]). But this efficiency is only in terms of probe counts; the cost to
compute a single hash address is $c \log n$, given the hash functions developed to date.
Thus the formal results of [16] are that dictionaries can be accessed with $O(\log n)$
computations per data operation, which is hardly surprising. For PRAM emulation
on Omega networks (cf. [5], [13]), there would seem to be limited opportunity to
exploit pipelining to mask latency for read intensive algorithms, as long as each address
computation requires the evaluation of a polynomial having $\log n$ degree. Optimal
speedup would appear to be beyond reach, even in theory.

Accordingly,  it  is  reasonable  to  ask, 

Is there an inherent $\log n$ penalty for computing such hash functions,
or can we do better?

This paper shows how to trade the time complexity of $(h)$-wise independent hash
functions for the number of random bits provided to them, and gives a mechanism for
computing $(n^\delta)$-wise independent functions in $O(1)$ time from $n^\epsilon$ random words, for
any fixed $\epsilon < 1$, and suitably fixed $\delta$ depending on $\epsilon$ and the word size of the domain.
More precisely, the tradeoff is between the time for function evaluation and
the workspace plus precomputation that is needed to provide a pool of random words.
The actual number of random seeds required for the computation is $\Theta(h)$, which is
optimal.

Moreover,  we  establish  a  tight  T-S-h  tradeoff  among  the  requisite  number  of  probes 
$T$ to a pool of $S$ random seeds and the amount of independence $h$ exhibited by the family
of random hash functions so constructed. The actual construction simply combines the probed
data values with the “Exclusive-Or” operator and uses twice the number of probes
proved  necessary  by  the  lower  bound.  While  the  question  of  how  to  compute  these 
probe  sequences  effectively  is  still  open,  we  show  that  constant  time  random  families 
with  very  high  independence  are  programmable,  for  a  constant  that  is  exponential  in 
the  time  predicted  by  a  nonuniform  model  of  computation. 
An immediate consequence of these constructions is that double hashing using these
universal functions has (constant factor) optimal performance in time, for load factors
bounded below 1. Another consequence is that a $T$-time PRAM algorithm for $n \log n$
processors (and $n^k$ memory) can be emulated on an $n$-processor machine interconnected
by an $n \times \log n$ Omega network with a multiplicative penalty for total (non-switching)
work that, with high probability, is only $O(1)$; optimal speedup is achieved.

The paper is organized as follows. Section 2 presents three random function constructions,
from probabilistic (i.e., nonconstructive) but extremely efficient, to programmable
(with code). They all run in constant time and are effectively $(n^\delta)$-wise
independent, for different, suitably small fixed $\delta > 0$. Section 3 gives a lower bound to
show  that  the  first  construction  is  optimal,  in  terms  of  the  number  of  random  words 
that  are  used  per  function  evaluation.  Section  4  gives  a  few  applications  while  Section 
5  presents  the  conclusions  and  the  main  open  question  concerning  these  hash  functions. 

2.  The  hash  function 

The  motivating  question  leading  to  our  constructions  is  based  on  the  following  simple 
observation. Given $h$ random elements from domain $[0, m-1]$, these coefficients can be
used as in equation (1) to construct an $(h)$-wise independent hash function. Evidently,
evaluation requires $O(h)$ time. If $m$ random elements are provided, then table lookup
gives an $O(1)$ time function. What sort of random functions can be constructed from
$n^\epsilon$ random seeds?

For the purposes of this paper, we might view the physical storage as size $n$ within
a virtual address space of $n^k$, and take $n^\epsilon$ to be an acceptable portion of space to
allocate for random function† computation. Our underlying model of computation is
the Random Access Machine where both memory access and the basic arithmetic and
logic operations can be executed on words in unit time (cf. [1]).

† The performance, on the other hand, only gets better if more space is available for the hashing
operation.
We temporarily suppress the issue of program size and construct a family of fast
highly independent hash functions that map $[0, n^k - 1]$ onto $[0, n^k - 1]$ and use $n^\epsilon$ words
of random input. We are also suppressing the issue of domain size. The reason this can
be done is well known: a simple linear congruence hash function can be used to map
any  fixed  set  of  s  <  n  elements  from  a  large  domain  D  =  [0,  m]  into  the  small  domain 
[0,  nk],  so  that  the  probability  no  collisions  occur  is  at  least  1  -  £_2  .  Such  mappings  can 
be pieced together from techniques in [3], [9] and [4]. The details of this construction
are  in  Appendix  1. 

Consequently,  this  space  reducing  prehashing  step  has  only  a  minimal  impact  on 
the  performance  of  the  resulting  hash  functions.  We  may  formalize  this  fact  as  follows. 

Definition  2. 

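A family of hash functions $F$ with domain $D$ and range $R$ is $r$-practical $(h, \mu)$-wise
independent if for any subset $S \subset D$, with $|S| \le n$, $\exists\, \hat{F} \subset F$ : $|F - \hat{F}| < |F|/|R|^r$ and
$\forall\, y_1, \ldots, y_h \in R$, $\forall$ distinct $x_1, x_2, \ldots, x_h \in S$:

$$\frac{|\hat{F}|}{\mu |R|^h} \le |\{f \in \hat{F} : f(x_i) = y_i,\ i = 1, 2, \ldots, h\}| \le \mu \frac{|\hat{F}|}{|R|^h}.$$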

The real point of this definition is to quantify the performance of good hash functions
that are constructed by a randomized algorithm, which might include a prehashing
step that randomly selects a prime $p \approx n^k$. For most applications, it suffices to have
almost all hash functions exhibit the collective randomness that is desired. If a randomly
selected function is from a poorly behaved subset, we can, depending on the
underlying process at hand, attribute a cost of $n^l$ for using it, where, say, $l < r$. Then
the expected performance penalty for these bad functions is negligible. The
reason that our hash functions are defined on $[0, n^k - 1]$ for fixed but unspecified $k$ is to
expose the tradeoffs in computational resources and residual errors such as that caused
by the prehashing step that contracts our domain to a polynomial size. The bound on
the distribution for $F$ is stated in a two-sided form to facilitate inclusion-exclusion calculations.
Such formulations hold for most families of universal hash functions defined
to date, although the original definitions of $(h)$-wise independence have not required it.
As for the factor $\mu$, our basic constructions satisfy the criterion for $\mu = 1$, as do the
degree $h - 1$ polynomials taken mod $p$.

The crux of the problem, then, is to construct $(h)$-wise independent maps from
$[0, p-1]$ onto $[0, p-1]$, for a fixed prime $p \ge n^k$ (or where $p$ is a power of 2). If
the ultimate intended range is smaller, say $[0, n-1]$, then a final postprocessing that
computes the outcome mod $n$ will give the result with a small number of hash functions
that skew onto $[0, p \bmod n]$ those data items provisionally mapped into $[n\lfloor p/n \rfloor, p-1]$.
This final skewing increases $\mu$ by a factor of $(1 + n/p)^h$, which is quite modest for,
say, $p > hn^2$.
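The size of this skew is easy to check exactly. The sketch below is illustrative only (the helper name `mod_n_skew` is an assumption): for $y$ uniform on $[0, p-1]$, the most heavily loaded residue class mod $n$ receives $\lceil p/n \rceil$ values, so its probability exceeds $1/n$ by a factor of at most $1 + n/p$, and $h$ independent coordinates compound this to at most $(1 + n/p)^h$.

```python
from fractions import Fraction

def mod_n_skew(p, n):
    """Exact ratio of max_b Pr[(y mod n) = b] to 1/n, for y uniform on [0, p-1]."""
    heaviest = -(-p // n)  # ceil(p / n) values land in the most loaded bucket
    return Fraction(heaviest, p) * n
```

For example, with $p = 101$ and $n = 10$ the single-point ratio is $110/101$, which is below $1 + n/p = 111/101$.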

We shall restrict, for the moment, the problem to constructing fully $(h, 1)$-wise
independent hash functions that map $[0, p-1]$ onto $[0, p-1]$, given an auxiliary pool
of about $n^\epsilon$ random $(\log p)$-bit words, for some $\epsilon < 1$, and fixed prime $p \approx n^k$. Now
any random hash function must have a mechanism that associates each element in
$[0, p-1]$ with a few of these random words, as otherwise no random computation can
result. If the association is deterministic, then it can be represented by a bipartite
graph $G$ on the vertex sets† $[0, p-1]$ and $[0, p^\epsilon - 1]$. Moreover, such a bipartite graph
must associate at least $j$ random numbers with each set of $j$ elements from $[0, p-1]$,
for $j \le h$, as otherwise there are not enough degrees of freedom to achieve $(h)$-wise
independence. According to Hall's Theorem, which is also known as the Marriage
Theorem, this criterion is equivalent to every subset of $h$ elements in $[0, p-1]$ having
a matching in the graph with its neighbors, which comprise a small subset contained
within the $p^\epsilon$ words. Suitable graphs are formalized as follows.

Definition  3. 

Let a $(p, \epsilon, d, h)$-weak concentrator be a bipartite graph on sets of vertices $I$ (inputs)
and $O$ (outputs), where $|I| = p$, $|O| = p^\epsilon$, and the following hold. Each input has
outdegree $d$. Any set of $h$ inputs has edges that achieve a matching with some $h$
outputs.

† Of course the graph could be defined with the first vertex set restricted to just $[0, n^k - 1]$.
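For very small graphs the matching condition of Definition 3 can be checked by brute force. The sketch below is an illustration under stated assumptions, not a construction from the paper: `adj` is assumed to be a list in which `adj[i]` holds the outputs adjacent to input `i`, and the function names are hypothetical. It uses the standard augmenting-path method for bipartite matching and enumerates every $h$-subset of inputs, so it is only practical for toy sizes.

```python
from itertools import combinations

def has_matching(adj, inputs):
    """True if every input in `inputs` can be matched to a distinct output."""
    match = {}  # output -> input currently holding it

    def try_assign(i, seen):
        for o in adj[i]:
            if o in seen:
                continue
            seen.add(o)
            # Take a free output, or evict the current holder if it can move.
            if o not in match or try_assign(match[o], seen):
                match[o] = i
                return True
        return False

    return all(try_assign(i, set()) for i in inputs)

def is_weak_concentrator(adj, h):
    """Check the matching condition for every set of h inputs (exponential time)."""
    return all(has_matching(adj, s) for s in combinations(range(len(adj)), h))
```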

Our next observation is that these graphs can be constructed with very small outdegree
$d$.


Lemma 1. For $r > 0$, $d = 2 + \frac{r+1}{\epsilon}$, and $h \le p^{\frac{\epsilon r + \epsilon^2}{r+1}} \big/ e^{\frac{2\epsilon}{r+1}}$, $(p, \epsilon, d, h)$-weak concentrators exist.


Proof: We use the probabilistic method (cf. [15]) to estimate the probability that
a randomly constructed graph will fail to meet the matching criterion. The construction
assigns, to each node in $I = [0, p-1]$, edges to $d$ distinct random nodes in $O = [0, p^\epsilon - 1]$.
Thus a matching is guaranteed for subsets of $d$ or fewer vertices in $I$. For larger
aggregates of size at most $h$, Hall's Theorem says that there will always be a matching
if and only if each subset of $j \le h$ vertices in $I$ has edges to at least $j$ vertices in
$O$. The probability that some such aggregate fails to have a matching is less than the
expected number of such subsets that fail Hall's criterion, which is the expected number
of subsets in $I$ of size $j$ whose neighbors lie within some subset of $j - 1$ vertices in $O$. In
particular, the probability of a failure is overestimated by the expected number of pairs
$(S \subset I, T \subset O)$, where $|S| = j$, $|T| = j - 1$, and all $jd$ edges from $S$ have destinations
within $T$, for $d < j \le h$. Evidently, the probability that the $jd$ edges are so selected,
for any fixed $(S, T)$, is $\left( \binom{j-1}{d} \big/ \binom{p^\epsilon}{d} \right)^{j} \le \left( \frac{j-1}{p^\epsilon} \right)^{jd}$, and the number of candidate $(S, T)$ pairs
is $\binom{p}{j} \binom{p^\epsilon}{j-1}$.


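Following this prescription, we can estimate the probability that a randomly generated
graph fails to be a $(p, \epsilon, d, h)$-weak concentrator by summing
over all relevant pairs of subsets to get:

$$\begin{aligned}
\Pr(\text{failure}) &\le \sum_{d < j \le h} \binom{p}{j} \binom{p^\epsilon}{j-1} \left( \frac{\binom{j-1}{d}}{\binom{p^\epsilon}{d}} \right)^{j} \\
&\le \sum_{d < j \le h} \binom{p}{j} \binom{p^\epsilon}{j-1} \left( \frac{j-1}{p^\epsilon} \right)^{jd} \\
&\le \sum_{j \le h} \frac{p^{j + j\epsilon}\, j^{2j+1+j(r+1)/\epsilon}}{j!\, j!\, p^{2j\epsilon + (r+1)j + \epsilon}} \\
&\le_S \sum_{j \le h} \frac{e^{2j}\, p^{j + j\epsilon}\, j^{2j+1+(r+1)j/\epsilon}}{j^{2j+1}\, p^{2j\epsilon + (r+1)j + \epsilon}} \\
&\le p^{-\epsilon} \sum_{j \le h} \left( e^{2} j^{(r+1)/\epsilon} / p^{\epsilon + r} \right)^{j} \\
&\le p^{-\epsilon} \sum_{j \le h} \left( e^{2} h^{(r+1)/\epsilon} / p^{\epsilon + r} \right)^{j} \le \sum_{j \le h} p^{-\epsilon} \le \frac{h}{p^\epsilon} < 1.
\end{aligned}$$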


Since the probability is less than 1 that a randomly constructed graph fails to be
a $(p, \epsilon, d, h)$-weak concentrator, it follows that such a construction will succeed with
positive probability and hence these graphs do indeed exist. ∎
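The random construction analyzed in this proof is simple to write down. The following is a minimal sketch, with the function name and the `seed` parameter introduced here as assumptions; it only samples a graph and does not certify that the sample is a weak concentrator.

```python
import random

def random_bipartite(num_inputs, num_outputs, d, seed=None):
    """Give each input edges to d distinct, uniformly chosen outputs."""
    rng = random.Random(seed)
    return [rng.sample(range(num_outputs), d) for _ in range(num_inputs)]
```

Storing such a graph explicitly takes $d$ words per input, which is the description-size problem taken up later in this section.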

We  have,  as  yet,  no  hash  function;  but  each  element,  at  least,  is  now  associated  with 
a  few  random  values.  The  obvious  use  for  these  values  is  as  coefficients  of  a  hashing 
polynomial.  By  increasing  the  number  of  random  values  used  in  this  calculation,  we  can 
turn  a  weak  concentrator  into  a  calculation  procedure  for  fully  (h)-wise  independent 
hash  functions. 

Let $G$ be a $(p, \epsilon, d, h)$-weak concentrator. For each input $i$ in $G$, let $i$'s $d$ neighbors
in $G$ be stored in the set $Adj(i)$. Let $M_m$ be a $p^\epsilon \times d$ array of words in $[0, p-1]$, whose
concatenated contents is $m \in [0, p-1]^{dp^\epsilon}$, for some prime $p \ge n^k$.

Define  the  random  hash  function 


$$f_m(i) = \sum_{j \in Adj(i),\ 0 \le l < d} M_m(j, l)\, i^{l} \pmod{p}.$$

Thus computing $f_m(i)$ requires $d^2$ additions and $d$ multiplications plus a comparable
number of modular divisions. The result turns out to be $(h)$-wise independent.
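The evaluation of this hash function can be sketched as follows. This is an illustration rather than the paper's code: the name `concentrator_hash` is hypothetical, `adj[i]` is assumed to list the $d$ neighbors of input `i`, and `M[j][l]` is assumed to hold the word in row $j$, column $l$ of the random array. Column sums are formed first, so the powers of $i$ are handled with $d$ multiplications by Horner's rule.

```python
def concentrator_hash(i, adj, M, p):
    """Sum of M[j][l] * i**l over j in adj[i] and 0 <= l < d, reduced mod p."""
    d = len(adj[i])
    # d*d additions: one column sum per power of i.
    col_sums = [sum(M[j][l] for j in adj[i]) % p for l in range(d)]
    acc = 0
    for s in reversed(col_sums):  # d multiplications (Horner's rule)
        acc = (acc * i + s) % p
    return acc
```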

† We are using a simple version of Stirling's Formula: $\sqrt{j}\,(j/e)^{j} < j!$, for $j > 0$. Subsequent
applications will be denoted by the annotated inequality sign $\le_S$.
Lemma 2. Let $G$ be a $(p, \epsilon, d, h)$-weak concentrator. Then $\{f_m\}_{m \in [0, p-1]^{dp^\epsilon}}$ is an $(h, 1)$-wise
independent family of hash functions mapping $[0, p-1] \mapsto [0, p-1]$.

Proof: It is not difficult to see that we need only establish the linear independence
of the systems of equations that constrain the $m$ values to yield arbitrarily specified
values for $f_m$ on any $h$ inputs. This is because such a system has $h$ constraints in $dp^\epsilon$
unknowns. If the system enjoys linear independence, then the null space has dimension
$dp^\epsilon - h$, and each set of $h$ values will be attained for $p^{dp^\epsilon - h}$ of the $p^{dp^\epsilon}$ sets of random
words. So suppose that specifying values for some input set $I_0$, where $|I_0| \le h$, induces
a minimal dependent linear system. That is, a linear combination of the rows in the
linear system with row indices in $I_0$ sums to the zero vector, and no row has a coefficient
of zero in the linear combination. Now $I_0$ sources $d|I_0|$ edges, which reach at least $|I_0|$
outputs, so there must be an output $y_0 \in O$ having exactly $q$ edges that originate in
$I_0$, for some $q$ where $0 < q \le d$. Consider the linear subsystem with rows indexed
by $I_1 = \{i \in I_0 : y_0 \in Adj(i)\}$. By definition of $y_0$, $1 \le |I_1| = q \le d$. The subsystem
restricted to the variables $M(y_0, k)$, where $k = 0, 1, \ldots, d-1$, has coefficients that are the
Vandermonde submatrix

$$\begin{pmatrix}
1 & i_1 & i_1^2 & \cdots & i_1^{d-1} \\
1 & i_2 & i_2^2 & \cdots & i_2^{d-1} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & i_q & i_q^2 & \cdots & i_q^{d-1}
\end{pmatrix}$$

where $I_1$ comprises the distinct rows $(i_1, i_2, \ldots, i_q)$. As is well known and easily verified,
the determinant of the matrix (in the case $q = d$) is $\prod_{0 < j < k \le q} (i_k - i_j)$, which
shows that such a subsystem cannot be linearly dependent because no two rows are
the same. Since none of the other rows with indices in $I_0$ have any of the variables
$M(y_0, 0), M(y_0, 1), \ldots, M(y_0, d-1)$ present, the assumption that the system is minimal
and dependent is contradicted. ∎

So far, we have a probabilistic fast hashing procedure that is $(h)$-wise independent,
uses $dp^\epsilon$ random words of $\log p$ bits, and requires $d^2$ additions and $d$ multiplications
per evaluation. The construction gives a generic transformation from a graph rich in
matchings to a family of highly random functions. We now give a more efficient construction
that uses better random graph properties, with a constant factor degradation
in the independence (or outdegree $d$), but where only one random value is stored per
output destination. The construction uses a sparse bipartite graph where every $h$ rows
of its adjacency matrix are linearly independent. In fact, the linear independence holds
for $GF(2)$, the field of integers mod 2.

Definition  4. 

Let an $(n, \epsilon, d, h)$-weakly triangular graph be a bipartite graph on sets of vertices
$I$ (inputs) and $O$ (outputs), where $|I| = n$, $|O| = n^\epsilon$, and the following hold. Each
node in $I$ has an outdegree of at most $d$. The $|I| \times |O|$ adjacency matrix of the graph,
when restricted to any $h$ input rows, and all $|O|$ columns, can be permuted into an
upper triangular form with nonzero diagonal: suitable row and column permutations
transform the $h \times n^\epsilon$ submatrix $S$ so that $S(i, l) = 0$ if $i > l$, and $S(i, i) = 1$ for
$i = 1, 2, \ldots, h$.

Of course any $(n, \epsilon, d, h)$-weakly triangular graph is also $(n, \epsilon, d, j)$-weakly triangular, for
$j < h$.

Lemma 3. Let $G$ be $(n, \epsilon, d, h)$-weakly triangular. For each input $i$ in $G$, let $i$'s
neighbors in $G$ be stored in the set $Adj(i)$. Let $M_m$ be an array of $n^\epsilon$ words in $[0, n-1]$,
with concatenated contents $m \in [0, n-1]^{n^\epsilon}$, where $n$ is a power of 2. Define the random
hash function

$$f_m^G(i) = \mathrm{XOR}_{j \in Adj(i)}\, M_m(j),$$

where XOR is the bitwise “Exclusive-Or” function (or any other commutative group
operation such as modular addition). Then $\{f_m^G\}_{m \in [0, n-1]^{n^\epsilon}}$ is an $(h, 1)$-wise independent
family of hash functions mapping $[0, n-1] \mapsto [0, n-1]$.
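The hash function of Lemma 3 amounts to a handful of table lookups combined with XOR. A minimal sketch follows, assuming `adj[i]` lists the neighbors of input `i` and `M` is the pool of random words indexed by output; the name `xor_hash` is introduced here for illustration.

```python
from functools import reduce
from operator import xor

def xor_hash(i, adj, M):
    """XOR together the pool words M[j] for every neighbor j of input i."""
    return reduce(xor, (M[j] for j in adj[i]), 0)
```

As a toy check, take three inputs with neighbor sets $\{0,1\}$, $\{1,2\}$, $\{0,2\}$: any two rows of the adjacency matrix are linearly independent over $GF(2)$, so enumerating all pools over 2-bit words shows each pair of hash values on two inputs occurring equally often, while all three rows together are dependent.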

Proof: As in Lemma 2, we need only show that a minimal dependent set of row
vectors must comprise $h + 1$ or more vectors. But this is immediate, since the system is
easily solved by the back substitution step of Gaussian elimination. Given $j$ equations,
identify a variable that appears only once and first solve the $j - 1$ equations that are
independent of it. Then the last equation is readily solved under our commutative
group operation such as the “Exclusive-Or”. Since the number of solutions to $j$ such
equations in $l$ unknowns is $n^{l-j}$, the $(h, 1)$-wise independence is ensured. ∎

We  now  show  that  some  random  graphs  are  weakly  triangular.  Later  others  will 
also  be  shown  to  have  this  property. 

Definition  5. 

Let an $(n, \epsilon, d, h)$-weak expander be a bipartite graph on sets of vertices $I$ (inputs)
and $O$ (outputs), where $|I| = n$ and $|O| = n^\epsilon$, and the following hold. Each input has
an outdegree bounded by $d$. Any set of $j$ inputs, for $1 \le j \le h$, has edges to at least
$\lfloor jd/2 \rfloor + 1$ different outputs.

Lemma 4. An $(n, \epsilon, d, h)$-weak expander is $(n, \epsilon, d, h)$-weakly triangular.

Proof: For $j \ge 1$, any $j \le h$ rows have at least $\lfloor jd/2 \rfloor + 1$ different output variables
among at most $jd$ edges, whence at least one such variable will appear in exactly one of
the rows (otherwise the rows would need at least $2(\lfloor jd/2 \rfloor + 1) > jd$ edges). Permuting this
variable and row into location $S(1, 1)$ leaves $j - 1$ rows with the same property, whence
recursion completes the construction. ∎
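The recursion in this proof is a peeling procedure, which can be sketched as follows. This is an illustration under assumptions: `adj[r]` is taken to list the outputs adjacent to input row `r`, and the name `triangular_order` is hypothetical. The function repeatedly picks a row that owns an output seen in no other remaining row; the resulting order of (row, pivot output) pairs gives the row and column permutations of an upper triangular form.

```python
from collections import Counter

def triangular_order(adj, rows):
    """Return [(row, pivot_output), ...] in peeling order, or None if stuck."""
    remaining = set(rows)
    order = []
    while remaining:
        # Count how many remaining rows touch each output.
        use = Counter(o for r in remaining for o in set(adj[r]))
        pick = next(((r, o) for r in sorted(remaining)
                     for o in adj[r] if use[o] == 1), None)
        if pick is None:
            return None  # no output is private to a single remaining row
        order.append(pick)
        remaining.remove(pick[0])
    return order
```

If the graph is a weak expander, the counting argument above guarantees that the procedure never gets stuck on any set of at most $h$ rows.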

Lemma 5. For $\epsilon \ge \frac{2}{d} + \frac{1 + \log d + \log h}{\log n}$, $(n, \epsilon, d, h)$-weak expanders exist.

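Proof: Proceeding as in Lemma 1 gives, for our previous random construction:

$$\begin{aligned}
\Pr(\text{failure}) &\le_S \sum_{j \le h} \left( n^{1 - \epsilon d/2} (hde/2)^{d/2} \right)^{j} / j! \\
&\le \sum_{j \le h} \left( e^{-(1 + \log d + \log h) d/2} (hde/2)^{d/2} \right)^{j} / j! \\
&\le \sum_{j \le h} (1/2)^{dj/2} / j! < 1.
\end{aligned}$$

∎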
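To get a feel for the magnitudes involved, the bound in this proof can be evaluated numerically. The sketch below is an illustration only: it reads the threshold of Lemma 5 as $\epsilon \ge 2/d + (1 + \log d + \log h)/\log n$ with natural logarithms, which is the form under which the middle line of the derivation collapses to $(1/2)^{dj/2}/j!$. The function names are assumptions introduced here.

```python
import math

def lemma5_epsilon(n, d, h):
    """Threshold on epsilon, assuming natural logarithms throughout."""
    return 2 / d + (1 + math.log(d) + math.log(h)) / math.log(n)

def lemma5_failure_bound(n, d, h, eps):
    """Sum over 1 <= j <= h of (n^(1 - eps*d/2) * (h*d*e/2)^(d/2))^j / j!."""
    base = n ** (1 - eps * d / 2) * (h * d * math.e / 2) ** (d / 2)
    return sum(base ** j / math.factorial(j) for j in range(1, h + 1))
```

For instance, with $n = 2^{20}$, $d = 8$ and $h = 16$ the threshold is below 1, and at that threshold each term's base equals $(1/2)^{d/2} = 1/16$.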

Combining Lemmas 3 and 4 with Lemma 5, where $\binom{n}{j}$ is replaced by $\binom{n^k}{j}$, gives the
following.
Theorem 1. For $\epsilon \ge \delta + \frac{2k}{d} + \frac{1 + \log d}{\log n}$, there are fixed programs that, for each input from
$[0, n^k - 1]$, “Exclusive-Or” together $d$ words from a pool of $n^\epsilon$ random $(k \log n)$-bit words
to compute an $(n^\delta, 1)$-wise independent hash function mapping $[0, n^k - 1] \mapsto [0, n^k - 1]$,
where $n$ is a power of 2. ∎

It is worth observing that the collections of $n^\epsilon$ random numbers used in Theorem
1 will yield a family of $(h)$-wise independent hash functions if they are selected from
a family of $(dh)$-wise independent pools. Thus each pool can be precomputed from
$dh$ random seeds. Consequently the space-time tradeoff, for families of fast highly
independent hash functions, is not a function of the number of random bits that must
be specified (which is essentially its Kolmogorov complexity) but is really a matter of
intrinsic storage requirements.
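One way to precompute such a pool can be sketched as follows. The paper does not say here how the $(dh)$-wise independent pool is drawn, so this is an assumption: the sketch fills the table with a polynomial whose $dh$ coefficients are the random seeds, evaluated over a prime field. That choice suits the modular-addition variant of Lemma 3 rather than XOR, and it requires the pool size to be at most the prime `p`. The name `precompute_pool` is hypothetical.

```python
def precompute_pool(seeds, pool_size, p):
    """pool[j] = sum_t seeds[t] * j**t mod p, for j = 0 .. pool_size - 1."""
    pool = []
    for j in range(pool_size):
        acc = 0
        for a in reversed(seeds):  # Horner's rule over the dh seeds
            acc = (acc * j + a) % p
        pool.append(acc)
    return pool
```

The precomputation costs time proportional to the pool size times the number of seeds, but it is done once, after which each hash evaluation touches only $d$ table entries.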

This  second  construction  gives  a  more  efficient  family  of  hash  functions,  and  again 
provides  a  generic  procedure  that  turns  a  good  graph  into  a  family  of  hash  functions. 
It  does  not  quite  supersede  the  first  construction  because  there  are  no  known  explicit 
graphs of either type. Should a short (deterministic or effective probabilistic) algorithm
be found, which builds weak concentrators where an input's adjacency list can be
generated  in  constant  time,  then  fast  highly  independent  hash  functions  will  follow. 
Similarly,  effective  procedures  for  constructing  weak  expanders  will  yield  even  better 
hash  functions. 

The  difficulty  with  the  constructions  presented  so  far  is  that  the  random  graphs 
$G$ require a huge description for their $dn$ edges. Moreover, the problem of finding
such a graph seems to be quite difficult. Fortunately, Cartesian products can be used
to attain compact representations of less efficient hash functions, where we forgo some
randomness, and increase our $O(1)$ operation count to an exponentially larger constant.
This section closes with a simple analysis of what can be done to achieve fast highly
independent hash functions that are less efficient than the constructions given so far,
but which nevertheless run in constant time and are spatially compact. These variations
can be applied to either formulation, but we shall restrict our attention, for the most
part, to the latter, since they appear to be more efficient.

Definition 6.

Let the Cartesian product $A \times B$ of two bipartite graphs $A = (I, O, E)$ and
$B = (U, V, F)$ be the graph $C = (W, X, H)$, with input vertex set $W = I \times U$, output set
$X = O \times V$, and edge set $H$, which contains the edge from $(i, u) \in W$ to $(o, v) \in X$ if
and only if edge $(i, o) \in E$ and edge $(u, v) \in F$.
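
To make Definition 6 concrete, the following minimal Python sketch builds the Cartesian product of two small bipartite graphs. The adjacency-list representation (a dict from each input vertex to the list of its outputs), the function name `cartesian_product`, and the example graphs are assumptions made for illustration only; they are not part of the paper's construction.

```python
from itertools import product


def cartesian_product(adj_a, adj_b):
    """Return the adjacency lists of the product graph A x B.

    adj_a maps each input i of A to the list of outputs o with (i, o) in E;
    adj_b does the same for B.  Input (i, u) of the product is joined to
    output (o, v) exactly when (i, o) is an edge of A and (u, v) is an edge of B.
    """
    return {
        (i, u): [(o, v) for o, v in product(outs_a, outs_b)]
        for (i, outs_a), (u, outs_b) in product(adj_a.items(), adj_b.items())
    }


# Two tiny example graphs (hypothetical data).
A = {0: [0, 1], 1: [1]}
B = {0: [0], 1: [0, 1]}
C = cartesian_product(A, B)
```

In this example every input (i, u) of C has as many neighbors as the product of the out-degrees of i and u, which is the source of the degree term cd in Lemma 6.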

Lemma 6. Let $G_1$ be $(n, \epsilon, d, h)$-weakly triangular, and $G_2$ be $(m, \epsilon, c, h)$-weakly
triangular. Then the Cartesian product $G_1 \times G_2$ is $(mn, \epsilon, cd, h)$-weakly triangular.

Proof: We need only verify the weak triangularity property for $G_1 \times G_2$. Let
$x_1, x_2, \ldots, x_l$ be an arbitrary set of $l \le h$ distinct inputs in $[0, n-1] \times [0, m-1]$. Let $x_i$
have the Cartesian product representation $x_i = (z(i,1), z(i,2))$, $z(i,1) \in G_1$, $z(i,2) \in G_2$.
The following procedure finds a permutation that is upper triangular for the $x_i$.

    Initialize the permutation of rows and columns to ∅;
    Mark all outputs from G_1 as free;
    Assign Z_1 ← {z | z(i,1) = z for some i ∈ [1, l]};  % By construction, Z_1 is nonempty.
    repeat
        Delete a z_1 ∈ Z_1 that has a free output o_1 that is not an output of any other z ∈ Z_1;
        % This can be done because G_1 is weakly triangular.
        Mark o_1 as not free;
        Assign I ← {a | a ∈ [1, l] and z(a,1) = z_1};  % All such x_a agree in the G_1 coordinate.
        Mark all outputs in G_2 as free;
        Assign Z_2 ← {z | z(i,2) = z for i ∈ I};
        for each i ∈ I do
            Delete a z_2 ∈ Z_2 that has a free output o_2 that is not an output of another z ∈ Z_2;
            Mark o_2 as not free;
            Assign row (z_1, z_2) and column (o_1, o_2) to be next in the permutation
        endfor
    until Z_1 is empty;
    return the permutation.

It is easy to see that the algorithm orders the nodes in an upper triangular order. ∎
In particular, if $G$ is an $(n^\epsilon, \epsilon, d, h)$-weak expander, then the Cartesian product $G^{1/\epsilon}$
is $(n, \epsilon, d^{1/\epsilon}, h)$-weakly triangular.

Combining Lemmas 3, 4, 5, and 6 with suitable rescaling gives the following.


Theorem 2. For $\epsilon \ge \frac{2k}{d} + \sqrt{k\,\frac{1+\log d+\log h}{\log n}}$ there are fixed programs of size $O(\epsilon^2 d/k)\,n^\epsilon$
that, for each input from $[0, n^k-1]$, "Exclusive-Or" together $d^{k/\epsilon}$ words from a pool
of $n^\epsilon$ random $(k\log n)$-bit words to compute an $(h,1)$-wise independent hash function
mapping $[0, n^k-1] \mapsto [0, n^k-1]$, where $n^\epsilon$ is a power of 2.

Proof: We store an explicit $(n^\epsilon, \epsilon/k, d, h)$-weak expander $G$ as part of the hash
function; a value $i \in [0, n^k-1]$ is hashed by computing the adjacency list for $i$ in $G^{k/\epsilon}$,
and applying the "Exclusive-Or" to the random words so probed as the neighbors of $i$.

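It only remains to verify the existence of an $(n^\epsilon, \epsilon/k, d, h)$-weak expander for the
parameters at hand. Computing as before gives

$$\mathrm{Prob}\{\text{failure}\} < \sum_{1\le j\le h}\binom{n^\epsilon}{j}\binom{n^{\epsilon^2/k}}{jd/2}\left(\frac{jd/2}{n^{\epsilon^2/k}}\right)^{jd}$$

$$\le \sum_{j\le h}\left(n^{\epsilon-\epsilon^2 d/2k}\,(jde/2)^{d/2}\right)^{j}/j!$$

$$\le \sum_{j\le h}\left(e^{-(1+\log d+\log h)d/2}\,(hde/2)^{d/2}\right)^{j}/j!$$

$$\le \sum_{j\le h}(1/2)^{jd/2}/j! < 1. \qquad ∎$$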
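
The step from a product of binomial coefficients to a power of $n$ in the chain above uses only two standard estimates, $\binom{N}{j} \le N^j/j!$ and $\binom{N}{m} \le (eN/m)^m$, where $e$ is the base of the natural logarithm. As a sketch of that step, write $N = n^{\epsilon^2/k}$ for the number of outputs and $m = jd/2$ (both abbreviations are introduced here only for illustration), and assume the failure event for a fixed $j$ is bounded by $\binom{n^\epsilon}{j}\binom{N}{m}(m/N)^{jd}$. Then

$$\binom{n^\epsilon}{j}\binom{N}{m}\left(\frac{m}{N}\right)^{2m} \le \frac{n^{\epsilon j}}{j!}\left(\frac{eN}{m}\right)^{m}\left(\frac{m}{N}\right)^{2m} = \frac{n^{\epsilon j}}{j!}\left(\frac{em}{N}\right)^{m} = \frac{1}{j!}\left(n^{\epsilon-\epsilon^2 d/2k}\,(jde/2)^{d/2}\right)^{j}.$$

The remaining steps replace $j$ by its largest value $h$ inside the parentheses and then use the lower bound on $\epsilon$ to make the power of $n$ small enough to cancel the factor $(hde/2)^{d/2}$ down to $(1/2)^{d/2}$.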

The conditions of Theorem 2 can be simplified to yield, for example, an $(n^\delta, 1)$-wise
independent family of hash functions on $[0, n^k-1]$, when $\epsilon = \frac{2k}{d} + \sqrt{k\left(\delta + \frac{1+\log d}{\log n}\right)}$. The
functions can be evaluated in time $d^{k/\epsilon}$, and have a program size of $O(n^\epsilon)$. The point
of this construction is that for fixed $k$ and large, slowly growing $h$, $\epsilon = O(1/d) + o(1)$ as
$n \to \infty$.

For  completeness,  we  state  without  proof  the  analogous  composition  formulation 
for  the  first  construction. 

Lemma 7. Let $G$ be an $(n^\epsilon, \epsilon, d, h)$-weak concentrator. Then the Cartesian product
$G^{1/\epsilon}$ is an $(n, \epsilon, d^{1/\epsilon}, h)$-weak concentrator. ∎

The families of hash functions have as yet been "demonstrated" only in a probabilistic
sense; no explicit constructions have been given. Formally (that is, up to
constant factors), this distinction is moot. By increasing, slightly, the degrees of freedom
in our probabilistic constructions, the same counting argument will ensure that
with probability $1 - 1/n^r$, a randomly selected graph is a weak concentrator or expander.
Accordingly, we may simply increase the size of the hash family by indexing it over all
graphs satisfying the (modified) size and degree parameters of Lemma 5. The resulting
randomized construction $F^{(n,\epsilon,d)}$ is an explicit family of O(1) time hash functions that
is essentially (h)-wise independent, as characterized by Definition 2.

Lemma  8.  For  e  >  a  random  bipartite  graph  on  [0,  n—  1]  x  [0,  ne  -  1] 

with outdegree $d$ is an $(n, \epsilon, d, h)$-weak expander with probability exceeding $1 - n^{-r}$.

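Proof: We apply the random construction used for Lemma 5, but include the
(algorithm) simplifying modification that each input vertex receives $d$ edges selected at
random with replacement:

$$\mathrm{Prob}\{\text{failure}\} < \sum_{j}\binom{n}{j}\binom{n^\epsilon}{jd/2}\left(\frac{jd/2}{n^\epsilon}\right)^{jd}$$

$$\le \sum_{2\le j\le h}\left(n^{1-\epsilon d/2}\,(jde/2)^{d/2}\right)^{j}/j!$$

$$\le n^{-r}\sum_{j\le h}(1/2)^{jd/2}/j! < n^{-r}. \qquad ∎$$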

Combining Lemmas 3, 4, and 8, substituting $n^k$ for $n$, and $n^{\epsilon/k}$ for $n^\epsilon$ gives the
following characterization of the family $F^{(n,\epsilon,d)}$.

Theorem  3.  For  e  >  +  -|l  +  A'  1+1°g^gtKlog  h ,  F^n,£,d^  is  an  explicit  family  of  r-practical 
$(h,1)$-wise independent hash functions mapping $[0, n^k-1] \mapsto [0, n^k-1]$. $F^{(n,\epsilon,d)}$ has
a program space of $\frac{\epsilon^2 d}{k}\,n^\epsilon + O(1)$ $(\log n)$-bit words, and for each input from $[0, n^k-1]$,
computes the "Exclusive-Or" of $d^{k/\epsilon}$ members in a pool of $n^\epsilon$ random $(k\log n)$-bit words.


The  requirement  for  e  is  readily  simplified  to  e  >  y  +  ,  H-  +  k  1+lo§  rf+loS  h . 


log  n 

Proof: The program for $F^{(n,\epsilon,d)}$ is essentially an array $A$ of words belonging to
$[0, n^{\epsilon^2/k}-1]$. The $d$ edges emanating from vertex $i \in [0, n^\epsilon-1]$ of $G$ are found in locations
$A[l]$, for $di \le l < d(i+1)$. Given $j \in [0, n^k-1]$, the $d^{k/\epsilon}$ locations among the $n^\epsilon$ random
words are found by expanding the edges from vertex $j$ of $G^{k/\epsilon}$ on the fly in $O(d^{k/\epsilon})$
time. The program for this expansion requires O(1) space.

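Thus it suffices to verify the existence of an $(n^\epsilon, \epsilon/k, d, h)$-weak expander for the
parameters at hand. Computing as before gives

$$\mathrm{Prob}\{\text{failure}\} < \sum_{1\le j\le h}\binom{n^\epsilon}{j}\binom{n^{\epsilon^2/k}}{jd/2}\left(\frac{jd/2}{n^{\epsilon^2/k}}\right)^{jd}$$

$$\le \sum_{1\le j\le h}\left(n^{\epsilon-\epsilon^2 d/2k}\,(jde/2)^{d/2}\right)^{j}/j!$$

$$\le n^{-r}\sum_{1\le j\le h}(1/2)^{jd/2}/j! < n^{-r}. \qquad ∎$$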

Here  the  bipartite  graph  is  part  of  the  random  input,  whereas  before  one  good  graph 
was  shown  to  service  the  entire  family  of  hash  functions.  Thus  in  the  former  case,  an 
amortized  randomized  algorithm  might  require,  upon  rare  occasion,  new  random  seeds 
to  attain  a  better  family  member  for  the  current  data,  but  the  graph  would  last  forever; 
for  applications  of  Theorem  3,  the  new  hash  function  candidate  would  include  randomly 
selected  edges  for  a  new  random  graph  among  its  random  seeds. 

For  completeness,  a  rather  crudely  transparent  iterative  version  of  the  algorithm  is 
presented  below. 

function Random(i: in [0, n^k - 1]): in [0, n^k - 1];

    Global A: n^ε x 1 array of words in [0, n^k - 1];
    Global G: n^ε x d array of words in [0, n^(ε²/k) - 1];
    Local l_1, l_2, ..., l_(k/ε): in [0, d - 1];
    Local i_1, i_2, ..., i_(k/ε): in [0, n^ε - 1];
    Local j: in [0, n^ε - 1];
    Local val: in [0, n^k];

    Assign (i_1, i_2, ..., i_(k/ε)) ← i;
    val ← 0;
    for l_1 ← 0 to d - 1 do
      for l_2 ← 0 to d - 1 do
        ...
          for l_(k/ε) ← 0 to d - 1 do
            Assign j ← (G[i_1, l_1], G[i_2, l_2], ..., G[i_(k/ε), l_(k/ε)]);
            val ← (A[j] XOR val)
    endallfors;  % Altogether, d^(k/ε) XORs take place.
    return(val).
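
The iterative procedure can also be sketched in runnable form. The Python below is a minimal illustration under stated assumptions: the parameter `out_base` stands in for $n^{\epsilon^2/k}$ (the number of outputs of G), `parts` stands in for $k/\epsilon$, and the graph G is filled with random neighbors, which does not by itself guarantee the weak expander property the theorems require. The names `build_hash`, `out_base`, `parts`, and `word_bits` are hypothetical.

```python
import random
from itertools import product


def build_hash(out_base, parts, d, word_bits, seed=0):
    """Build one sampled hash function of the XOR-of-probes form.

    out_base  plays the role of n^(eps^2/k): the number of outputs of G
    parts     plays the role of k/eps: the number of coordinates of an input
    d         is the out-degree of G
    The pool A holds n^eps = out_base**parts random words of word_bits bits.
    """
    rng = random.Random(seed)
    pool_size = out_base ** parts  # n^eps
    # G[v][l] is the l-th neighbor of input v.  Random neighbors are an
    # assumption of this sketch; they need not form a weak expander.
    G = [[rng.randrange(out_base) for _ in range(d)] for _ in range(pool_size)]
    A = [rng.getrandbits(word_bits) for _ in range(pool_size)]

    def h(i):
        # Split i into `parts` digits, each in [0, pool_size - 1].
        digits = []
        for _ in range(parts):
            i, digit = divmod(i, pool_size)
            digits.append(digit)
        val = 0
        # One probe per choice (l_1, ..., l_parts): d**parts XORs in total.
        for choice in product(range(d), repeat=parts):
            j = 0
            for digit, l in zip(digits, choice):
                j = j * out_base + G[digit][l]  # pack the tuple into a pool index
            val ^= A[j]
        return val

    return h, G, A


h, G, A = build_hash(out_base=4, parts=2, d=3, word_bits=16, seed=1)
```

With these toy parameters the pool has 16 words, the domain is [0, 255], and each evaluation performs 3² = 9 probes. Using `itertools.product` in place of the nested loops keeps the number of loop levels a parameter rather than a fixed depth.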

3.  A  lower  bound 

We now show that the size of our random word pool cannot be materially reduced without
affecting the running time of the hash function. A family $F_M = \{f_m\}_{m\in M}$ of $(h,\mu)$-wise independent
hash functions, where $f_m : S \mapsto S$, will be modeled as follows. Each
$f_m$ is defined by the same algorithm, which inputs $x$ and then reads $d$ locations in an
array $A[1..z]$, that contains $z$ values belonging to $S$. Index $m$ is the string of concatenated
data contained in $A$. The algorithm can even be viewed as probabilistic since values
found in $A$ might be used with $x$ in an adaptive search to determine which other array
locations to access. These values and $x$ are then used deterministically to compute the
random function value in $S$. Let $n^{\underline{j}} = n(n-1)(n-2)\cdots(n-j+1)$.

Theorem 4. Let $F_M = \{f_m\}_{m\in M}$ denote a family of $(h,\mu)$-wise independent hash
functions mapping $S \mapsto S$, where $M \subset S^z$. Then the time complexity $T$ to evaluate
$f \in F_M$ satisfies either $T \ge h$ or

$$z^{\underline{T}} \ge (h-2)^{\underline{T-1}}\,(|S| - \mu).$$

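Proof: We may suppose that each computation of $f$ examines $d$ entries in the
array $A$. We show that $d$ satisfies the constraint for $T$. For each set $\zeta$ of $h-1$ locations
in the $z$ element array $A$, we partition $M$ into $M = (M_1^\zeta, M_2^\zeta, \ldots, M_{|S|^{h-1}}^\zeta)$, where $M_i^\zeta$
is the set of strings in $M$ that equal, on $\zeta$, the $i$-th enumeration of a fixed ordering of
$S^{h-1}$. Let $S(\zeta,i)$ be the set of domain elements $x \in S$ that, when computing $f_m(x)$ for
$m \in M_i^\zeta$, have their $d$ $A$-locations read from within $\zeta$. Let

$$S_0(\zeta,i) = \begin{cases} S(\zeta,i), & \text{provided } |M_i^\zeta| > \mu|M|/|S|^h; \\ \emptyset, & \text{if } |M_i^\zeta| \le \mu|M|/|S|^h. \end{cases}$$

Given an $s \in S(\zeta,i)$, $f_m(s)$ will be computed by probing the same $d$-tuple of locations
within $\zeta$ for all $m \in M_i^\zeta$.

There are $\binom{z}{h-1}$ subsets $\zeta$, and each subset induces a partition of $M$ indexed by the
$m$-values restricted to $\zeta$. Let $\Sigma = \sum_\zeta\sum_i |M_i^\zeta|\,|S(\zeta,i)|$, and set $\Sigma_0 = \sum_\zeta\sum_i |M_i^\zeta|\,|S_0(\zeta,i)|$.
It follows that $|S_0(\zeta,i)| < h$ since otherwise there are $h$ elements in $S$ that hash to some
tuple with probability exceeding $\mu/|S|^h$. Hence $\Sigma_0 \le \sum_\zeta\sum_i (h-1)|M_i^\zeta| = \binom{z}{h-1}(h-1)|M|$,
whence

$$\Sigma_0 \le (h-1)\binom{z}{h-1}|M|. \qquad (2)$$

On the other hand, each $m$-string in $M$ will be probed, for each $s \in S$, in $d$ locations,
which means that the pair $(s,m)$ is counted exactly $\binom{z-d}{h-1-d}$ times in $\Sigma$. Hence

$$\Sigma = \binom{z-d}{h-1-d}|S|\,|M|. \qquad (3)$$

Finally, each $s \in S$ may be able to encounter each of the $|S|^d$ sequences of probe values
within $\binom{z-d}{h-1-d}$ different $\zeta$ sets, which have $h-1-d$ unprobed locations that can have
up to $|S|^{h-1-d}$ different assignments. That is, any $s \in S$ belongs to at most $\binom{z-d}{h-1-d}|S|^{h-1}$
different $S(\zeta,i)$. In view of this, we can count that
$\Sigma - \Sigma_0 \le \sum_\zeta\sum_{i:\,S_0(\zeta,i)=\emptyset} |M_i^\zeta|\,|S(\zeta,i)| \le \frac{\mu|M|}{|S|^h}\,|S|\binom{z-d}{h-1-d}|S|^{h-1}$, whence

$$\Sigma - \Sigma_0 \le \binom{z-d}{h-1-d}\,\mu|M|. \qquad (4)$$

Combining equation (3) and inequality (4) gives

$$\Sigma_0 \ge \binom{z-d}{h-1-d}(|S| - \mu)|M|. \qquad (5)$$

Combining inequalities (2) and (5) gives

$$(h-1)\binom{z}{h-1} \ge \binom{z-d}{h-1-d}(|S| - \mu).$$

Eliminating common factors establishes that

$$\frac{z^{\underline{d}}}{(h-2)^{\underline{d-1}}} \ge |S| - \mu. \qquad ∎$$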

Notice that when $h$ random probes to the pool are allowed, the bound on $d$ collapses
to the empty requirement $z^{\underline{h}} \ge 0$. Of course $h$ random numbers are necessary and
sufficient, in this case. For $d < h$, the bound limits from below the number of probes $d$ needed per
evaluation of an (h)-wise independent hash function that uses a database of $z = n^\epsilon$
random $(k\log n)$-bit words to map $[0, n^k-1] \mapsto [0, n^k-1]$. In view of Theorem 1, we
conclude that

$T = \Omega(k/\epsilon)$, for $T < h$.

Restated, we have a time-log(space) tradeoff: $T\log(\mathit{Space}) \ge \log(\mathit{Range})$, where Space
is the number of words in the pool of random words (exhibiting (h)-wise independence)
and log(Range) is the word size of the domain, range, and pool. Moreover, this lower
bound and tradeoff applies to any algorithm with any level of precomputation, for
we may simply view any internal storage and precomputed values as part of the pool
measured by $z$.
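
The tradeoff can be explored numerically. The sketch below assumes that the bound of Theorem 4 has the form $z^{\underline{d}} \ge (h-2)^{\underline{d-1}}(|S| - \mu)$, which is what eliminating common factors from inequalities (2) and (5) yields; the function names `falling` and `min_probes` are hypothetical helpers introduced only for this illustration.

```python
def falling(x, j):
    """Falling factorial x(x-1)...(x-j+1); equals 1 when j == 0."""
    result = 1
    for t in range(j):
        result *= x - t
    return result


def min_probes(z, h, range_size, mu=1):
    """Smallest d < h permitted by the assumed bound, or h if no such d exists.

    The assumed bound is  falling(z, d) >= falling(h - 2, d - 1) * (|S| - mu),
    with z the pool size and range_size = |S|.
    """
    for d in range(1, h):
        if falling(z, d) >= falling(h - 2, d - 1) * (range_size - mu):
            return d
    return h
```

For example, `min_probes(10, 4, 20, 0)` rejects a single probe, since a pool of 10 words cannot cover a range of 20 values, and accepts two probes. When the pool is too small for any d below h, the function falls back to h, matching the first alternative of the theorem.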

The dependence on $r$, for $r$-practical schemes, is more dramatic. Our constructions
show that for any fixed $r$, linear hash functions can reduce the problem from a domain
of size $S$ to one of size $n^{r+2}$, provided the lookup table $A$ contains random words from
$S$.


We  also  remark  that  the  counting  argument  for  Theorem  4  gives  an  average  case 
time  bound.  More  precisely,  let  T  <  h  be  the  bound  from  Theorem  4.  Then  the  time, 


avera; 


ged  over  all  items  in  5,  is  at  least  T  -  4r-( 


T-l 


T-2 


'5'V2)  — 

be  expressed  as  T  —  0(1)  when  &  >  ch  for  fixed  c  >  1. 


+ 


(A- 2) 


T-3 


+  . . .  +  z),  which  can 
4.  Applications

The constructions of Section 2 show that (h)-wise independent hash functions, for nonconstant
$h < n^\delta$ and sufficiently small constant $\delta > 0$, can actually be programmed as
constant time subroutines that require only a moderate size pool of random numbers
as input. Thus we have established the computational feasibility of any probabilistic
algorithm that has a performance bound based exclusively upon the use of such
functions.

The  two  examples  cited  in  this  section  are  by  no  means  self-contained.  The  first, 
which  concerns  the  performance  of  double  hashing,  follows  from  an  elaborate  proof  [16] 
based  on  (O(log  n))-wise  independence.  Consequently,  Corollary  1  follows  trivially.  Yet 
even  the  performance  bounds  for  full  independence  [7]  are  subtle  and  educational,  and 
it  is  still  not  clear  if  the  elegant  proof  technique  of  [7]  can  be  translated  into  a  proof  for 
limited  independence. 

The second application, which concerns the pipelined emulation of an idealized
$n\log n$ processor parallel machine on an $n$ processor real machine, requires simple modifications
of the original construction [13], which is based upon $(O(\log n))$-wise independence.
The original algorithm is elegant but sufficiently elaborate that we only
present the changes. In both applications, the original references are necessary and
recommended for a complete understanding of the results.

Corollary 1. For fixed load factor $\alpha < 1$, $O(\log n)$-wise independent hash functions
can be used for double hashing with constant expected probing for unsuccessful search. ∎

It should be noted that the result of [16] only needs $O(\log n)$-wise independent hash functions
that map, say, $[0, n^4] \mapsto [0, n-1]$.

Randomized routing schemes and PRAM emulation have had a substantial and
fruitful recent literature [21], [17], [2], [12], [18], [19], [5], [13]. In particular, [5] and
[13] show formally (and perhaps plausibly) how $n\log n$-processor Omega-like networks
can, with very high probability, emulate an $n\log n$-processor PRAM with an optimal
performance penalty that is "only" a multiplicative factor of $\log n$.

Both Karlin and Upfal [5] and Ranade [13] presented schemes for an $n\log n$-processor
emulation of $n\log n$-processor PRAM algorithms. The processors are interconnected on
an $n \times \log n$ Omega network. For this configuration, no pipelining is possible with a
model featuring 100% randomized memory references, because each PRAM emulation
step causes the network to be effectively saturated for $O(\log n)$ time. Thus, their feasibility
results, which were based on hash functions comprising $\log n$ degree polynomials,
sustain no performance penalty for evaluating such a polynomial for each memory reference.
Given the $\log n$ performance cost for referencing, Karlin and Upfal did not need
to address the much less significant issue of what to do about hashing collisions at the
memory cell level; it simply cannot be a problem when $O(\log n)$ time is available to
locate each item. Ranade [13] mentions the issue and shows that a scheme using $\log n$
reads per fetch readily solves the problem: his solution is to specify the location of items
by their row number (which is in $[0, n-1]$) and the cell address of their module, but
modulo the $\log n$ modules in a row of an $n \times \log n$ Omega network.
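
For contrast with the constant-time functions of this paper, the polynomial hash functions mentioned above can be sketched as follows. This is the classical construction (a polynomial of degree at most h-1 with random coefficients over a prime field is h-wise independent on that field), evaluated by Horner's rule in O(h) arithmetic operations per reference. The prime `p`, the coefficient order, and the names `random_polynomial` and `poly_hash` are assumptions of this illustration; the cited schemes may further reduce the value, for example modulo a table size.

```python
import random


def random_polynomial(h, p, rng):
    """Return h random coefficients in [0, p-1], highest degree first."""
    return [rng.randrange(p) for _ in range(h)]


def poly_hash(coeffs, x, p):
    """Evaluate the polynomial at x modulo the prime p by Horner's rule."""
    acc = 0
    for c in coeffs:  # one multiply-add per coefficient: O(h) time
        acc = (acc * x + c) % p
    return acc


coeffs = random_polynomial(8, 101, random.Random(0))
```

With h = O(log n) coefficients, every memory reference costs a logarithmic number of multiplications, which is the per-reference cost that the constant-time constructions are meant to remove.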

Now  that  highly  independent  hash  functions  can  be  evaluated  in  constant  time, 
it  is  natural  to  reexamine  these  models  to  see  if  optimal  speed-up  can  be  achieved  by 
pipelining  these  algorithms  on  machines  that  feature  a  reduced  ratio  of  processor  density 
to  routing  capacity.  The  idea  of  pipelining  that  exploits  large  scale  parallel  slackness 
to mask network latency can be traced to Smith [S-78], and has also been a subject of
theoretical study in [11] and [20].

It is a simple matter to adapt the [5] and [13] constructions to emulate an $n\log n$-processor
PRAM machine on a machine having one column of $n$ processors interconnected by an
$n \times \log n$ Omega network. A PRAM step of $n\log n$ parallel instructions is emulated by
executing $\log n$ of the instructions in a pipeline of each processor.

The machine would also have $n$ memory modules, say, one per processor. Ranade's
Common PRAM emulation scheme applies with a few simple modifications: First, the
hash function would still be used to map the PRAM address $x \in [0, n^k-1]$ into
$[0, \log n - 1] \times [0, n-1] \times [0, n^{k-1}-1]$. The data packets are always kept locally lexicographically
sorted (with the value $x$ used to break and disambiguate ties). The first field, which,
in Ranade's scheme, designated the column number of the destination module, is still
used for the sorting of packets, but has no meaning in this case since only one column is
active. The local process number (in $[1, \log n]$) for each packet might be explicitly listed
in a separate field. The first phase of the algorithm requires each processor to provide
its data in sorted order to the next stages as appropriate. This preprocessing can be
done simply by following Ranade's approach: each processor uses its row of switches as
a systolic bubble sorter for its packets. The "column" numbers cannot be arbitrarily
set to the local process number for emulation of the Common PRAM, since combining
would not be adequate to guarantee that only $O(\log n)$ messages would arrive at each
memory module, per PRAM step. Such a scheme would, however, be adequate for an
EREW emulator, and in this case the bubblesort can be skipped.

Now  we  can  attain  optimal  speedup  in  emulation  mode: 

Corollary 2. A $T$-time $n\log n$ processor PRAM algorithm with $n^k$ words of shared
memory can, with high probability, be emulated on a pipelined $n$ processor $n \times \log n$
Omega network in time $O(T\log n)$.

Proof:  The  only  issue  to  address  concerns  memory  contention  at  the  cell  and 
module  levels. 

We first observe that the $n^k$ data are indeed well distributed among the $n$ modules.
From $(h,\mu)$-wise independence, we have that the expectation of the $h$-th moments of
the data counts apportioned among the modules is within a factor of $\mu$ of the fully
random case: Formally, let $m_i(f)$ be the number of items, among the $n^k$ data, that are
mapped to module $i$ by the hash function $f$, for $i = 1, 2, \ldots, n$. Then

$$E_f\left[\binom{m_i}{h}\right] = E_f\Big[\sum_{\text{all size } h\text{-subsets of data}} \mathrm{Prob}\{\text{the subset hashes to module } i\}\Big] \le \binom{n^k}{h}\frac{\mu}{n^h}.$$

We use this inequality in a Chebyshev bound:

$$\mathrm{Prob}\{m_i \ge \gamma n^{k-1} + h\} = \mathrm{Prob}\left\{\binom{m_i}{h} \ge \binom{\gamma n^{k-1}+h}{h}\right\} \le \frac{E\left[\binom{m_i}{h}\right]}{\binom{\gamma n^{k-1}+h}{h}} \le \frac{\mu\binom{n^k}{h}}{n^h\binom{\gamma n^{k-1}+h}{h}} \le \frac{\mu}{\gamma^h}.$$

Thus $\mathrm{Prob}\{\max_i(m_i) \ge \gamma n^{k-1} + h\} \le n\mu/\gamma^h$. Taking $h = O(\log n)$ sufficiently large and fixed
$\gamma > 1$ gives a polynomially small probability† that $\gamma n^{k-1}$ items are hashed to any of the
$n$ modules.
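
As a rough numerical illustration of how much independence the module-load argument consumes, the sketch below assumes the failure bound takes the form $n\mu/\gamma^h$ and finds the smallest $h$ that pushes it below $n^{-c}$. The function name `independence_needed` and the sample parameters are hypothetical.

```python
def independence_needed(n, gamma, c, mu=1):
    """Smallest h with n * mu / gamma**h <= n**(-c).

    Equivalently gamma**h >= mu * n**(c + 1); integer arithmetic avoids
    floating-point rounding.  gamma and mu are assumed to be integers here.
    """
    if gamma <= 1:
        raise ValueError("gamma must exceed 1 for the bound to decrease in h")
    target = mu * n ** (c + 1)
    h = 0
    while gamma ** h < target:
        h += 1
    return h
```

For n = 1024 and c = 2, a slack factor of 2 needs h = 30 and a slack factor of 3 needs h = 19, so h grows like a constant times log n, as the text requires.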


We may suppose the aggregate storage capacity of the $n$ modules is $(\gamma+1)n^k$ (multi-field)
words. For simplicity, we may assume that the data is stored via bucket hashing
with separate chaining, and that $\gamma$ is large enough to accommodate this scheme trivially
(say $\gamma = 3$). Thus the global address space of a module is size $n^{k-1}$, and each module
can store $\gamma n^{k-1}$ chained elements in storage external to its formal table space.


In  bucket  hashing,  an  item  hashed  to  a  given  location  can  be  found  in  time  propor¬ 
tional  to  the  number  of  items  hashed  to  the  same  address,  since  these  colliding  items 
are  stored  in  a  linked  list.  Thus  the  time  required  to  satisfy  r  references  to  a  single 
module  is  proportional  to  the  sum  of  the  list  lengths  for  the  r  locations. 
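
The cost model just described can be made concrete with a small sketch of bucket hashing with separate chaining. The class `ChainedTable`, the helper `batch_cost`, and the modulus hash used in the example are illustrative assumptions, not the storage layout of any cited emulation scheme.

```python
class ChainedTable:
    """Bucket hashing with separate chaining; hash_fn maps a key to a bucket index."""

    def __init__(self, num_buckets, hash_fn):
        self.buckets = [[] for _ in range(num_buckets)]
        self.hash_fn = hash_fn

    def insert(self, key, value):
        self.buckets[self.hash_fn(key)].append((key, value))

    def lookup(self, key):
        """Return (value, steps), where steps counts the chain entries inspected."""
        steps = 0
        for k, v in self.buckets[self.hash_fn(key)]:
            steps += 1
            if k == key:
                return v, steps
        return None, steps


def batch_cost(table, keys):
    """Sum of the chain lengths of the buckets touched by a batch of references."""
    return sum(len(table.buckets[table.hash_fn(k)]) for k in keys)


table = ChainedTable(4, lambda k: k % 4)
for key in [0, 4, 8, 1]:
    table.insert(key, str(key))
```

Here keys 0, 4, and 8 collide in one bucket, so a reference to key 8 inspects three entries, and a batch touching that bucket and the bucket of key 1 has cost 3 + 1 = 4.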


For the pipelined emulation of a single PRAM step, we can measure delays due to
local collisions by a number of random variables, which get their randomness entirely
from the hash function used to translate variable names into address locations. Let, for
a single $\log n$ deep superstep that batches together one $n\log n$-way PRAM operation,
$n_i$ be the number of memory references to module $i$, for $i = 1, 2, \ldots, n$. Let $l_i$ be the
sum of the list lengths of the locations referenced in module $i$. Then the portion of the
running time, for the single step, that is due to local processing within each module
is simply $l_i$, and the maximum of these random variables measures the intrinsic
delay due to local retrievals. It is easy to use a buffer to sequence the return of data at
times consistent with Ranade's original emulation algorithm, provided sufficient delay
is introduced to ensure that all internal processing is successfully completed, with high
probability. The delay has two parts, one due to routing, which is $O(\log n)$ in size with
very high probability [13], and a second due to local access of multiple items hashed to
module $i$.

†A value is polynomially small if it depends on parameters that can be set so that it is less than
$n^{-c}$ for any fixed $c$ and sufficiently large $n$.

We shall declare the pipelined emulation to fail at any step where $n_i > \beta\log n$ or
$l_i > 4\beta\log n$. Now,

$$\mathrm{Prob}\{\max_i(n_i) > \beta\log n \vee \max_i(l_i) > 4\beta\log n\} \le n \times \mathrm{Prob}\{n_1 > \beta\log n\} + n \times \mathrm{Prob}\{n_1 \le \beta\log n \wedge l_1 > 4\beta\log n\}.$$

Using $(h,\mu)$-wise independence with $h \ge \beta\log n$, we calculate that among a batch of
$v \le n\log n$ memory references, the probability that at least $\beta\log n$ of them will be
hashed to (bucket) locations within memory module 1 is bounded by the expected
number of such $(\beta\log n)$-subsets:

$$\mathrm{Prob}\{n_1 \ge \beta\log n\} \le \mu\binom{n\log n}{\beta\log n}\,n^{-\beta\log n}.$$

This probability is superpolynomially small in $n$ for fixed $\beta$.

We may use $(4\beta\log n, \mu)$-wise independence to overestimate $\mathrm{Prob}\{l_1 > 4\beta\log n \wedge n_1 \le \beta\log n\}$
very crudely as the expected number of pairs $(S_1, S_2)$ where $S_1$ is a set of
$j \le \beta\log n$ references, among the actual $v \le n\log n$ memory references, that hash onto
module 1, and $S_2$ is a set of $3\beta\log n$ elements, among the $n^k - v$ unreferenced items,
that hash into the hashing image of $S_1$:

$$\mathrm{Prob}\{l_1 > 4\beta\log n \wedge n_1 \le \beta\log n\} \le \mu\sum_{j\le\beta\log n}\binom{n\log n}{j}\,n^{-j}\binom{n^k}{3\beta\log n}\left(\frac{j}{n^k}\right)^{3\beta\log n}$$

$$\le \mu\sum_{j\le\beta\log n}\frac{(\log n)^{j}\,j^{3\beta\log n}}{j!\,(3\beta\log n)!}$$

$$\le \mu\,\frac{(\log n)^{\beta\log n}\,(\beta\log n)^{3\beta\log n}}{(\beta\log n)!\,(3\beta\log n)!}.$$

This probability is polynomially small in $n$. Hence $\mathrm{Prob}\{\max_i(n_i) > \beta\log n \vee \max_i(l_i) > 4\beta\log n\}$
is polynomially small.

Choosing suitably large constant $\beta$ and $h = 4\beta\log n$ gives the desired performance
bounds. ∎
Double hashing (cf. [16]) provides a formally simpler hashing method with essentially
the same performance.

A formulation comparable to Corollary 2, conditioned on the existence of suitable
hashing algorithms and corresponding hardware, was recently given in [20]. Versions
of the basic counting estimates, with the exception of the bound on the aggregate
number of collision items encountered by a batch of references queued at one memory
module, can be found in [11] along with some early analysis of pipelining and various
hashing schemes. It should also be noted that the Fast Fourier Transform can be used
to compute $k$ evaluations of a degree $k$ polynomial in $k\log^2 k$ time (cf. [1]). Thus it
is possible to use the above pipeline strategy on $n$ processors with $\log n$ degree hash
functions to attain a performance cost of $(\log\log n)^2$ operations per memory reference
rather than a naive $\log n$. We have shown that this multiplicative performance penalty
can be reduced to O(1).

5.  Conclusions 

Real  machines  have  significant  amounts  of  memory.  We  have  shown  how  to  exploit 
this  capacity  to  store  a  sublinear  sized  database  of  random  words  in  local  memory  to 
define  highly  independent  hash  functions  that  can  be  evaluated  in  constant  time.  For  the 
development  of  probabilistic  algorithms  and  the  use  of  large  scale  parallel  machines,  this 
capability  has,  at  least,  theoretical  importance.  We  have  also  shown  that  such  functions 
have  an  intrinsic  tradeoff  between  their  evaluation  time  and  the  storage  reserved  for 
precomputed  data  (or  their  amortized  evaluation  time  and  the  space  reserved  for  active 
storage). 

The high independence exhibited by our hash functions enriches the class of probabilistic
algorithms that can be shown to achieve their expected performance in real
computation. Proofs need not be restricted to $h$-wise independence for constant $h$, and
probability estimates can use the probabilistic method to calculate the expected number
of $h$-tuples satisfying various behavior criteria.
It  is  worth  noting  that  the  fast  hash  functions  described  in  this  paper  are  not  really
necessary  for  pure  routing  problems.  After  all,  if  an  adequately  random  assignment  of 
intermediate  destinations  provides,  with  very  high  probability,  nearly  optimal  perfor¬ 
mance  in  a  Valiant-Brebner  style  of  routing  [21],  then  the  same  destinations  could  be 
used  for  many  consecutive  routings. 

What these fast hash functions really provide is nearly uniform mappings of data
to modules and cell locations and a convenient way to assert that, with high probability,
no step in an $n^k$ emulation sequence takes more than $O(\log n)$ time to complete. Thus,
fast  hash  functions  are  even  important  for  fast  deterministic  routing  schemes,  if  large 
amounts  of  data  have  to  be  stored  in  a  randomized  manner.  In  addition,  hash  functions 
computed  from  destination  addresses  provide  a  way  for  common  memory  references  to 
be  fully  combined  en  route  in  Ranade’s  simple  queue  management  scheme,  and  this  is 
important  if  combining  is  required  to  avoid  hot  spot  contention. 

From  such  a  perspective,  this  work  gives  a  theoretical  foundation  for  the  very  prag¬ 
matic  use  of  Memory  Management  Units.  This  paper  gives  a  formal  proof  that  such 
organizations  work  well  in  pipelined  environments  for  a  model  of  computation  that  is 
feasible  and  “only”  a  constant  factor  slower  than  methods  used  in  practice. 

From a more abstract perspective, we have exposed a very close equivalence between
the true space-time computational complexity of (h)-wise independent hash functions
and single instances of bipartite graphs on $[0, n-1] \times [0, n^\epsilon]$ that have low input degree
$d$ and have good expansion properties for small vertex sets. A spatially compact graph
representation that can be used to compute the adjacency list of an input vertex in
time $T_0 = cd$ gives a time $T_1 = T_0$ hash function with a high degree of independence,
when augmented with a pool of $n^\epsilon$ random numbers. Similarly, a family of $\epsilon d$ highly
independent hash functions gives such a graph with $T_0 = \epsilon d\,T_1$, albeit with an additive
spatial cost of $\epsilon d\,n^\epsilon$ for the random numbers. It is worth remarking that the equivalence
holds in this direction because our probability estimates in Section 2 were calculated
from $h$-way expectations, and never used full independence. The resource blowup is the
modest factor $\epsilon d$ because a random function value in $[0, n-1]$ gives $1/\epsilon$ points in $[0, n^\epsilon-1]$.
A crude application of our lower bound imposes the requirement that $d \ge 1/\epsilon$, while
our hash function construction gives sufficiency with $d = 2/\epsilon + 1$.

The  most  significant  open  question  is  how  to  find  good  weak  expander-like  graphs 
that  are  defined  by  short  efficient  programs.  The  discovery  of  such  an  object  might 
have  a  very  beneficial  effect  on  the  practicality  of  such  a  class  of  functions. 

Acknowledgements 

The  author  thanks  J.P.  Schmidt  for  stimulating  discussion. 

Appendix  1 


Fact 1: Let $P_k = \{p \mid p \text{ is prime and } p \in (n^k\log m, (2+\beta)n^k\log m)\}$, for some small
suitably fixed $\beta > 0$. Then

$$\forall x \ne y \in D : \mathrm{Prob}_{p\in P_k}\{x \equiv y \bmod p\} \le n^{-k}.$$

Proof:  [9], [4]  By  the  Prime  Number  Theorem,  \Pk\  =  ( \  —  o(  1 ) ) , 

whence  fewer  than  l/nk  of  the  elements  of  Pk  can  divide  \x  -  y  |,  since  Y[pePk  P  > 
(n*logm)]-p*l  >  (to)”*.  | 


Fact 2: Let $F_0(p) = \{h \mid h(x) = ((ax + b) \bmod p) \bmod n^k,\ a \ne 0,\ b \in [0, p-1]\}$, where
$p > n^k$ is prime. Then

$$\forall x \ne y \in [0, p-1] : \mathrm{Prob}_{f\in F_0(p)}\{f(x) = f(y)\} \le n^{-k}.$$

Proof: [3] Given $x$ and $y$, $x, y \in [0, p-1]$, $x \ne y$, the number of different $f \in F_0(p)$
where $f(x) = f(y)$ is precisely the number of $2 \times 2$ linear systems in $a$ and $b$:

$$\begin{cases} ax + b \equiv c + d\,n^k \pmod p \\ ay + b \equiv c + e\,n^k \pmod p \end{cases} \qquad c + d\,n^k < p,\ \ c < n^k,\ \ e \ne d,\ \ c + e\,n^k < p.$$

Now $c + d\,n^k$ can have $p$ different values. The remaining parameter $e$ cannot be set to
$d$ because this would give $a = 0$. Thus there are at most $\lceil p/n^k \rceil - 1$ different values
available for $e$. Since there are $p(p-1)$ different functions in $F_0$, and $f(x) = f(y)$ for at
most $p(\lceil p/n^k \rceil - 1) \le p(p-1)/n^k$ of them, the result follows. ∎
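
Fact 2 can be checked by brute force for a small prime. In the sketch below the parameter `m` stands in for $n^k$, and the names `linear_hash` and `collision_count` are hypothetical; the enumeration over all pairs (a, b) is only feasible for toy values and is meant as a sanity check, not as part of any construction.

```python
def linear_hash(a, b, p, m):
    """Member of the family: x -> ((a*x + b) mod p) mod m, with a != 0."""
    return lambda x: ((a * x + b) % p) % m


def collision_count(x, y, p, m):
    """Number of pairs (a, b), a in [1, p-1], b in [0, p-1], with h(x) == h(y)."""
    return sum(
        1
        for a in range(1, p)
        for b in range(p)
        if linear_hash(a, b, p, m)(x) == linear_hash(a, b, p, m)(y)
    )


p, m = 13, 4
worst = max(collision_count(x, y, p, m) for x in range(p) for y in range(x + 1, p))
```

For every pair of distinct inputs the number of colliding functions is at most p(p-1)/m, so the collision probability over the p(p-1) members of the family is at most 1/m, in agreement with the counting in the proof.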

Combining Facts 1 and 2 shows that a hash function selected at random from
$\bigcup_{p\in P_k} F_0(p)$ will, with probability exceeding $1 - 2\binom{s}{2}n^{-k}$, map $s$ items from $D$ into
$[0, n^k]$ with no collisions at all among its $\binom{s}{2}$ pairs. We could take $k = 4$, so that the
probability of a collision is below $1/n^2$, and assume the functions $F^{(h)}$ are defined in
(1) for $p \approx n^4$.